History (forward n-gram) or Future (backward n-gram)? Which model to consider for n-gram analysis in Bangla?
Abstract
This paper examines whether n-gram modeling in Bangla has a directional advantage, that is, whether backward or forward n-gram modeling works better. The most commonly used n-gram analysis is a forward n-gram. In Bangla, however, a backward n-gram appears to be repeatedly more successful and to yield more grammatical results than a forward n-gram. This paper hypothesizes that the rationale behind this success is the syntactic ordering of constituents in Bangla. Bangla is a head-final, specifier-initial language, as opposed to English, which is head-initial and specifier-initial. Hence, in Bangla, the head comes after its argument in a phrase. If an n-gram analysis begins at a head and moves backward, it reaches that head's own argument; if it moves forward, it will probably pick up the argument of another head. Because the probability of occurrence of heads is higher, the probability of depending on a head is also higher, and hence a backward n-gram has a greater chance of yielding grammatical results. We carried out several experiments comparing the two directions in different applications, and the results showed an advantage in the backward direction. This should prove a useful linguistic insight for n-gram based analysis, whose preferred direction may depend on variations in constituent order.
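To make the two directions concrete, the following is a minimal sketch of a forward bigram model, which conditions a word on the word before it, and a backward bigram model, which conditions a word on the word after it. It is not the implementation used in the paper: the function names `train_bigrams` and `cond_prob`, the boundary symbols `<s>` and `</s>`, and the toy corpus of English glosses arranged in verb-final order are all assumptions made for illustration, and the estimates are unsmoothed relative frequencies.

```python
from collections import Counter

def train_bigrams(sentences, backward=False):
    """Count (context, word) pairs for a bigram model.

    Forward model: the context is the previous word.
    Backward model: the context is the next word.
    """
    pair_counts = Counter()
    context_counts = Counter()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        if backward:
            # Reading the sentence right to left turns "next word" into "previous word".
            padded = padded[::-1]
        for context, word in zip(padded, padded[1:]):
            pair_counts[(context, word)] += 1
            context_counts[context] += 1
    return pair_counts, context_counts

def cond_prob(model, context, word):
    """Unsmoothed relative-frequency estimate of P(word | context)."""
    pair_counts, context_counts = model
    if context_counts[context] == 0:
        return 0.0
    return pair_counts[(context, word)] / context_counts[context]

# Hypothetical toy data: English glosses in verb-final (head-final) order.
corpus = [
    ["I", "rice", "eat"],
    ["you", "rice", "eat"],
    ["I", "book", "read"],
]

forward_model = train_bigrams(corpus)
backward_model = train_bigrams(corpus, backward=True)

# Forward: predict the word that follows "I".
print(cond_prob(forward_model, "I", "rice"))
# Backward: starting from the head "eat", predict the word that precedes it.
print(cond_prob(backward_model, "eat", "rice"))
```

In the backward model the context is the verb at the end of the clause, so the estimate looks from a head back toward its own argument, which is the configuration the hypothesis above describes. The toy counts only illustrate the mechanics and say nothing about the experimental results.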
Similar resources
An Efficient Phone N-Gram Forward-Backward Computation Using Dense Matrix Multiplication
The forward-backward algorithm is commonly used to train neural network acoustic models when optimizing a sequence objective like MMI and sMBR. Recent work on lattice-free MMI training of neural network acoustic models shows that the forward-backward algorithm can be computed efficiently in the probability domain as a series of sparse matrix multiplications using GPUs. In this paper, we present...
Bacterial Endocarditis and Periodontal Disease
Bacterial endocarditis is the infection of inner lining of heart and /or heart valves. This disease is usually related to the presence of some pathogenic bacteria in mouth, digestive system or urinary tract. Most of the times, this infection happens in people with heart problems like the presence of prosthetic valves, history of previous endocarditis, some congenital heart defects and heart tra...
Analysis of Eye Movements and Linguistic Boundaries in a Text for the Investigation of Japanese Reading Processes
SUMMARY In order to investigate reading processes of Japanese language learners, we have conducted an experiment to record eye movements during Japanese text reading using an eye-tracking system. We showed that Japanese native speakers use "forward and backward jumping eye movements" frequently [13], [14]. In this paper, we analyzed further the same eye tracking data. Our goal is to examine w...
Dynamic Programming for NLP
The strategy of dynamic programming reduces the complexity of a search problem which decomposes into frequently-reused instances of the same problem. You encountered dynamic programming for n-gram segmentation in HW4. We have also discussed two more dynamic programming algorithms in lecture: Viterbi and forward-backward. For HW5, you will need to implement the Viterbi algorithm. The forward-bac...
Comparison of different POS Tagging Techniques (N-Gram, HMM and Brill’s tagger) for Bangla
There are different approaches to the problem of assigning each word of a text with a parts-of-speech tag, which is known as Part-Of-Speech (POS) tagging. In this paper we compare the performance of a few POS tagging techniques for Bangla language, e.g. statistical approach (n-gram, HMM) and transformation based approach (Brill’s tagger). A supervised POS tagging approach requires a large amoun...
Publication date: 2006